Algorithms for the Bregman k-Median problem

Author

  • Marcel R. Ackermann
Abstract

In this thesis, we study the k-median problem with respect to a dissimilarity measure Dφ from the family of Bregman divergences: Given a finite set P of size n from R^d, our goal is to find a set C of size k such that the sum of errors cost(P,C) = ∑_{p∈P} min_{c∈C} Dφ(p, c) is minimized. This problem plays an important role in applications from many different areas of computer science, such as information theory, statistics, data mining, and speech processing. Our main contribution is the development of a general framework of algorithms and techniques that is applicable to (almost) all Bregman divergences. In particular, we give a randomized approximation algorithm for the Bregman k-median problem that computes a (1 + ε)-approximate solution using at most 2^{Õ(k/ε)} n arithmetic operations, including evaluations of the Bregman divergence Dφ. In doing so, we give the first approximation algorithm known for this problem that provides any provable approximation guarantee. We also give a fast, practical, randomized approximation algorithm that computes an O(log k)-approximate solution for arbitrary input instances, or even an O(1)-approximate solution for certain well-separated input instances. In addition, we study the use of coresets in the context of Bregman k-median clusterings. In a nutshell, a coreset is a small (weighted) set that features the same clustering behavior as the original input set. We show how classical coreset constructions for the Euclidean k-means problem can be adapted to a special subfamily of the Bregman divergences, namely the class of Mahalanobis distances. We also give a new, randomized coreset construction for the Mahalanobis k-median problem in low-dimensional spaces that has several practical advantages. Furthermore, by introducing the notion of weak coresets, we give the first coreset construction applicable to (almost) all Bregman k-median clustering problems. Using these weak coresets, we are able to give the currently asymptotically fastest (1 + ε)-approximation algorithm known for the Bregman k-median problem. This algorithm uses at most O(kn) + 2^{Õ(k/ε)} log(n) arithmetic operations, including evaluations of the Bregman divergence Dφ.
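The objective in the abstract can be illustrated with a small sketch. It assumes the standard definition of a Bregman divergence, Dφ(p, c) = φ(p) − φ(c) − ⟨∇φ(c), p − c⟩, which the abstract itself does not spell out. All function names below are hypothetical and chosen for illustration only; this is not code from the thesis.

```python
import math

def bregman_divergence(phi, grad_phi, p, c):
    """Assumed standard form: D_phi(p, c) = phi(p) - phi(c) - <grad phi(c), p - c>."""
    inner = sum(g * (pi - ci) for g, pi, ci in zip(grad_phi(c), p, c))
    return phi(p) - phi(c) - inner

# Generator phi(x) = ||x||^2 yields the squared Euclidean distance.
def sq_norm(x):
    return sum(xi * xi for xi in x)

def sq_norm_grad(x):
    return [2.0 * xi for xi in x]

# Generator phi(x) = sum x_i ln x_i yields the (generalized) Kullback-Leibler
# divergence; it requires strictly positive coordinates.
def neg_entropy(x):
    return sum(xi * math.log(xi) for xi in x)

def neg_entropy_grad(x):
    return [math.log(xi) + 1.0 for xi in x]

def cost(points, centers, phi, grad_phi):
    """cost(P, C): sum over p in P of the divergence to the closest center in C."""
    return sum(
        min(bregman_divergence(phi, grad_phi, p, c) for c in centers)
        for p in points
    )

if __name__ == "__main__":
    P = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
    C = [(1.0, 0.0), (10.0, 0.0)]
    print(cost(P, C, sq_norm, sq_norm_grad))
```

Note that Dφ is generally asymmetric, so the order of arguments (point first, center second) matters when evaluating the cost.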

Related resources

Approximation Algorithms for Bregman Clustering Co-clustering and Tensor Clustering

The Euclidean k-means problem is fundamental to clustering, and over the years it has been intensely investigated. More recently, generalizations such as Bregman k-means [8], co-clustering [10], and tensor (multi-way) clustering [40] have also gained prominence. A well-known computational difficulty encountered by these clustering problems is the NP-hardness of the associated optimization task, ...

Bregman Clustering for Separable Instances

The Bregman k-median problem is defined as follows. Given a Bregman divergence Dφ and a finite set P ⊆ R^d of size n, our goal is to find a set C of size k such that the sum of errors cost(P,C) = ∑_{p∈P} min_{c∈C} Dφ(p, c) is minimized. The Bregman k-median problem plays an important role in many applications, e.g., information theory, statistics, text classification, and speech processing. We study ...

Coresets and approximate clustering for Bregman divergences

We study the generalized k-median problem with respect to a Bregman divergence Dφ. Given a finite set P ⊆ R^d of size n, our goal is to find a set C of size k such that the sum of errors cost(P,C) = ∑_{p∈P} min_{c∈C} Dφ(p, c) is minimized. The Bregman k-median problem plays an important role in many applications, e.g., information theory, statistics, text classification, and speech processing. We g...

Hardness and Non-Approximability of Bregman Clustering Problems

We prove the computational hardness of three k-clustering problems using an (almost) arbitrary Bregman divergence as dissimilarity measure: (a) The Bregman k-center problem, where the objective is to find a set of centers that minimizes the maximum dissimilarity of any input point towards its closest center, and (b) the Bregman k-diameter problem, where the objective is to minimize the maximum ...

PERFORMANCE COMPARISON OF CBO AND ECBO FOR LOCATION FINDING PROBLEMS

The p-median problem is one of the discrete optimization problems in location theory; it aims to satisfy total demand at minimum cost. A high-level algorithmic approach can be specialized to solve optimization problems. In recent years, meta-heuristic methods have been applied to support the solution of Combinatorial Optimization Problems (COP). Collision Bodies Optimization algorithm (CBO) a...

Publication date: 2009